We can use R to look at the relationship between random variables. Here we will play with Anscombe’s data. First we import the data.
anscombe <- read.csv("anscombe.csv")
head(anscombe)
The base R function pairs() draws a scatterplot matrix, plotting each variable against every other.
pairs(anscombe)
We note that the first column contains the plots of the \(a\), \(b\), and \(c\) \(y\) values against their shared \(x\) (xabc), and that the fifth row of the last column contains the plot of the \(d\) \(y\) values against their own \(x\) (xd). The relationship between \(y\) and \(x\) appears to be different in each of the four data sets.
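To compare the four relationships side by side, each \(y\) can be plotted against its own \(x\) in a 2-by-2 grid. What follows is a minimal sketch rather than part of the original analysis. It assumes that R's built-in datasets::anscombe holds the same four x-y pairs as anscombe.csv, and dat is a hypothetical stand-in data frame whose columns are renamed to match the names used here.

```r
# Assumption: the built-in data set stands in for anscombe.csv.
# Its columns are x1..x4 and y1..y4, with x1, x2 and x3 identical.
src <- datasets::anscombe
dat <- data.frame(
  xabc = src$x1,
  ya = src$y1, yb = src$y2, yc = src$y3,
  yd = src$y4, xd = src$x4
)

# One panel per data set, each y against its own x
op <- par(mfrow = c(2, 2))
with(dat, {
  plot(xabc, ya, main = "a")
  plot(xabc, yb, main = "b")
  plot(xabc, yc, main = "c")
  plot(xd, yd, main = "d")
})
par(op)  # restore the previous graphics settings
```

If the data frame read from anscombe.csv is already in the workspace, it can be used in place of dat.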
A look at the covariances and correlations may be informative.
cov(anscombe)
##        xabc         ya          yb       yc          yd     xd
## xabc 11.000  5.5010000  5.50000000  5.49700  0.02000000 -4.400
## ya    5.501  4.1272691  3.09560909  1.93343  0.26806909 -2.003
## yb    5.500  3.0956091  4.12762909  2.42524 -0.05958091 -3.037
## yc    5.497  1.9334300  2.42524000  4.12262  0.09328000 -1.947
## yd    0.020  0.2680691 -0.05958091  0.09328  4.12324909  5.499
## xd   -4.400 -2.0030000 -3.03700000 -1.94700  5.49900000 11.000
cor(anscombe)
##              xabc          ya          yb          yc           yd         xd
## xabc  1.000000000  0.81642052  0.81623651  0.81628674  0.002969709 -0.4000000
## ya    0.816420516  1.00000000  0.75000540  0.46871668  0.064982372 -0.2972715
## yb    0.816236506  0.75000540  1.00000000  0.58791933 -0.014442321 -0.4507110
## yc    0.816286739  0.46871668  0.58791933  1.00000000  0.022624662 -0.2891232
## yd    0.002969709  0.06498237 -0.01444232  0.02262466  1.000000000  0.8165214
## xd   -0.400000000 -0.29727146 -0.45071096 -0.28912321  0.816521437  1.0000000
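We note that, for each of the \(a\), \(b\), \(c\), and \(d\) data sets, the covariance and the correlation between the \(y\) values and the corresponding \(x\) are the same to within rounding error: about 5.50 and 0.816 respectively. The four correlations can be pulled out directly.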
with(anscombe, {
  c(
    cor(xabc, ya),
    cor(xabc, yb),
    cor(xabc, yc),
    cor(xd, yd)
  )
})
## [1] 0.8164205 0.8162365 0.8162867 0.8165214
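The same can be done for the four covariances, and the correlations can then be recovered from them, since a correlation is a covariance divided by the product of the two standard deviations. The sketch below is an illustration under the assumption that R's built-in datasets::anscombe holds the same four x-y pairs as anscombe.csv. The names dat, covs, and cors are hypothetical and do not appear in the original analysis.

```r
# Assumption: the built-in data set stands in for anscombe.csv.
src <- datasets::anscombe
dat <- data.frame(
  xabc = src$x1,
  ya = src$y1, yb = src$y2, yc = src$y3,
  yd = src$y4, xd = src$x4
)

# Covariance of each y with its own x
covs <- with(dat, c(
  a = cov(xabc, ya),
  b = cov(xabc, yb),
  c = cov(xabc, yc),
  d = cov(xd, yd)
))

# Standard deviations of the matching x and y columns
sd_x <- with(dat, c(sd(xabc), sd(xabc), sd(xabc), sd(xd)))
sd_y <- with(dat, c(sd(ya), sd(yb), sd(yc), sd(yd)))

# Correlation = covariance / (sd of x * sd of y)
cors <- covs / (sd_x * sd_y)

round(covs, 3)
round(cors, 4)
```

If the assumption holds, the values in cors should agree with those returned by cor() for the same four pairs.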